Exploiting Timelines to Enhance Multi-document Summarization
Authors
Abstract
We study the use of temporal information in the form of timelines to enhance multi-document summarization. We employ a fully automated temporal processing system to generate a timeline for each input document. We derive three features from these timelines, and show that their use in supervised summarization leads to a significant 4.1% improvement in ROUGE performance over a state-of-the-art baseline. In addition, we propose TIMEMMR, a modification to Maximal Marginal Relevance that promotes temporal diversity by computing time span similarity, and show its utility in summarizing certain document sets. We also propose a filtering metric that discards noisy timelines generated by our automatic processes, purifying the timeline input for summarization. By selectively using timelines guided by this filtering, overall summarization performance increases by a significant 5.9%.
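The abstract does not give the TIMEMMR formula, so the following is only a minimal sketch of the general idea: a greedy Maximal Marginal Relevance selection whose redundancy penalty also counts overlap between sentence time spans. The names `span_similarity` and `time_aware_mmr`, the interval-overlap similarity, and the weights `lam` and `mu` are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# A time span is a (start, end) pair on any numeric time axis (assumed representation).
Span = Tuple[float, float]


def span_similarity(a: Span, b: Span) -> float:
    """Length of the overlap of two spans divided by the length they cover together.

    This interval-overlap ratio is an illustrative choice, not the paper's measure.
    Spans that do not overlap (or only touch at a point) score 0.0.
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    covered = max(a[1], b[1]) - min(a[0], b[0])
    return overlap / covered


def time_aware_mmr(
    relevance: Sequence[float],
    text_sim: Sequence[Sequence[float]],
    spans: Sequence[Span],
    k: int,
    lam: float = 0.5,
    mu: float = 0.5,
) -> List[int]:
    """Greedily pick up to k sentence indices.

    Score = lam * relevance - (1 - lam) * redundancy, where redundancy is the
    largest mix of textual similarity and time span similarity to any sentence
    already selected. mu = 0 reduces this to ordinary MMR.
    """
    selected: List[int] = []
    candidates = list(range(len(relevance)))

    def score(i: int) -> float:
        if not selected:
            return lam * relevance[i]
        redundancy = max(
            (1 - mu) * text_sim[i][j] + mu * span_similarity(spans[i], spans[j])
            for j in selected
        )
        return lam * relevance[i] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy input: sentences 0 and 1 describe the same period, sentence 2 a later one.
relevance = [0.9, 0.8, 0.7]
text_sim = [[1.0, 0.2, 0.2], [0.2, 1.0, 0.2], [0.2, 0.2, 1.0]]
spans = [(0.0, 10.0), (0.0, 10.0), (20.0, 30.0)]

with_time = time_aware_mmr(relevance, text_sim, spans, k=2, mu=0.5)
without_time = time_aware_mmr(relevance, text_sim, spans, k=2, mu=0.0)
```

On this toy input the temporal term changes the second pick: with `mu=0.5` the sentence from the later period is preferred over the more relevant sentence that shares a time span with the first pick, while `mu=0.0` behaves like plain MMR.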
Similar resources
Hierarchical Summarization: Scaling Up Multi-Document Summarization
Multi-document summarization (MDS) systems have been designed for short, unstructured summaries of 10-15 documents, and are inadequate for larger document collections. We propose a new approach to scaling up summarization called hierarchical summarization, and present the first implemented system, SUMMA. SUMMA produces a hierarchy of relatively short summaries, in which the top level provides a...
Interpreting Time in Text Summarizing Text with Time
In this thesis, I study two key steps in building a logical representation of temporal information (a timeline) found within text from newswire articles: 1) intra-sentence event-timex (E-T) temporal relationship classification, and 2) article-wide event-event (E-E) temporal relationship classification. Events and time expressions (timexes) are basic units of temporal information in text. Th...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Exploiting Category-Specific Information for Multi-Document Summarization
We show that by making use of information common to document sets belonging to a common category, we can improve the quality of automatically extracted content in multi-document summaries. This simple property is widely applicable in multi-document summarization tasks, and can be encapsulated by the concept of category-specific importance (CSI). Our experiments show that CSI is a valuable metri...
Empirical analysis of exploiting review helpfulness for extractive summarization of online reviews
We propose a novel unsupervised extractive approach for summarizing online reviews by exploiting review helpfulness ratings. In addition to using the helpfulness ratings for review-level filtering, we suggest using them as the supervision of a topic model for sentence-level content scoring. The proposed method is metadata-driven, requiring no human annotation, and generalizable to different kin...